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Summary. We consider the generic problem of performing sequential Bayesian inference in a state- 
space model with observation process y, state process x and fixed parameter 9. An idealized approach 
would be to apply the iterated batch importance sampling (IBIS) algorithm of Chopin (2002). This is 
a sequential Monte Carlo algorithm in the 6i-dimension, that samples values of 6, reweights iteratively 
these values using the likelihood increments p{yt\yi:t-i,0), and rejuvenates the 6i-particles through 
a resampling step and a IVICMC update step. In state-space models these likelihood increments are 
intractable in most cases, but they may be unbiasedly estimated by a particle filter in the x-dimension, 
for any fixed 6. This motivates the SMC^ algorithm proposed in this article: a sequential Monte Carlo 
algorithm, defined in the 6'-dimension, which propagates and resamples many particle filters in the 
x-dimension. The filters in the x-dimension are an example of the random weight particle filter as in 
Fearnhead et al. (2010). On the other hand, the particle Markov chain Monte Carlo (PMCMC) frame- 
work developed in Andrieu et al. (2010) allows us to design appropriate MCMC rejuvenation steps. 
Thus, the 6'-particles target the correct posterior distribution at each iteration t, despite the intractabil- 
ity of the likelihood increments. We explore the applicability of our algorithm in both sequential and 
non-sequential applications and consider various degrees of freedom, as for example increasing dy- 
namically the number of s-particles. We contrast our approach to various competing methods, both 
conceptually and empirically through a detailed simulation study, included here and in a supplement, 
and based on particularly challenging examples. 

Keywords: Iterated batch importance sampling; Particle filtering; Particle Markov chain Monte Carlo; 
Sequential Monte Carlo; State-space models 



1. Introduction 

1.1. Objectives 

We consider a generic state-space model, with parameters 6 € Q, prior p{0), latent Markov process 
(xt), p{xi\0) = ne{xi), 

p{xt+i\xi;t,9) = p{xt+i\xt,9) = fe{xt+i\xt), t > 1, 

and observed process 

p{yt\yi:t-i,xi.,t-i,9) ^p{yt\xt,6) = gg{yt\xt), t > 1. 

For an overview of such models with references to a wide range of applications in Engineering, 
Economics, Natural Sciences, and other fields, see e.g. Doucet et al. (2001), Kiinsch (2001) or 



Cappe et al. (2005). 



We are interested in the recursive exploration of the sequence of posterior distributions 

7ro((?) =p(0), TTt{0,Xl..t)^p{0,Xl..t\yi:t), t>l, (1) 

as well as computing the model evidence p{yi:t) for model composition. Such a sequential analysis 
of state-space models under parameter uncertainty is of interest in many settings; a simple example 
is out-of-sample prediction, and related goodness-of-fit diagnostics based on prediction residuals, 
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which are popular for instance in Econometrics; see e.g. Section 4.3 of Kim et al. ( 1998 ) or Koop 



and Potter (2007). Furthermore, we shall see that recursive exploration up to time t = T may be 



computationally advantageous even in batch estimation scenarios, where a fixed observation record 
yi-T is available. 



1.2. State of the art 

Sequential Monte Carlo (SMC) methods are considered the state of the art for tackling this kind 
of problems. Their appeal lies in the efficient rc-use of samples across different times t, compared 
for example with MCMC methods which would typically have to be re-run for each time horizon. 
Additionally, convergence properties (with respect to the number of simulations) under mild as- 



sumptions are now well understood; see e.g. 


Del Moral and Guionnet 


(1999 


), Crisan and Doucet 


(2002 


1, Chopin 


(2004 


1, 


Oudjane and Rubenthaler (2005), Douc and Moulines (2008 


). See also 



Del Moral et al. (2006) for a recent overview of SMC methods. 

SMC methods are particularly (and rather unarguably) effective for exploring the simpler se- 
quence of posteriors, TTt{xt\9) = p{xf\yi.t, 0); compared to the general case the static parameters are 
treated as known and interest is focused on xt as opposed to the whole path XQ-t- This is typically 
called the filtering problem. The corresponding algorithms are known as particle filters (PFs) ; they 



are described in Section 2.1 in some detail. These algorithms evolve, weight and resample a popu- 
lation of Nx number of particles, x]'^'' , so that at each time t they are a properly weighted sample 
from 7Tt{xt\0). Recall that a particle system is called properly weighted if the weights associated 
with each sample are unbiased estimates of the Radon-Nikodym derivative between the target and 



the proposal distribution; see for example Section 1 of Fearnhead et al. (2010a I and references 
therein. A by-product of the PF output is an unbiased estimator of the likelihood increments and 
the marginal likelihood 



P{yi:t\9)^p{yi\e)l[p{ys 



yi:s-l, 



l<t<T. 



(2) 



s=2 



the variance of which increases linearly over time ( Cerou et al. 2011 ). 



Complementary to this setting is the iterated batch importance sampling (IBIS) algorithm of 



Chopin (2002) for the recursive exploration of the sequence of parameter posterior distributions. 



nt{9); the algorithm is outlined in Section 2.2 This is also an SMC algorithm which updates a 
population of Ne particles, O^'-^" , so that at each time t they are a properly weighted sample from 
Trt{9). The algorithm includes occasional MCMC steps for rejuvenating the current population of 
0-particles to prevent the number of distinct 0-particles from decreasing over time. Implementation 
of the algorithm requires the likelihood increments p{yt\yi:t-i, &) to be computable. This constrains 
the application of IBIS in state-space models since computing the increments involves integrating 
out the latent states. Notable exceptions are linear Gaussian state-space models and models where 
Xt takes values in a finite set. In such cases a Kalman filter and a Baum filter respectively can 
be associated to each 0-particle to evaluate efficiently the likelihood increments; see e.g. [Chopin 
( [2007t 



On the other hand, sequential inference for both parameters and latent states for a generic 
state-space model is a much harder problem, which, although very important in applications, 
is still rather unresolved; see for example Doucet et al. (2011), Andrieu et al. ( 2010[ ), Doucet 
et al. (2009) for recent discussions. The batch estimation problem of exploring irxid, xo;t) is a 



non-trivial MCMC problem on its own right, especially for large T. This is due to both high 
dependence between parameters and the latent process, which affects Gibbs sampling strategies 



( Papaspiliopoulos et al. 2007), and the difficulty in designing efficient simulation schemes for 



sampling from 7rT(xo:T|^)- To address these problems [Andrieu et al. (2010) developed a general 
theory of particle Markov chain Monte Carlo (PMCMC) algorithms, which are MCMC algorithms 
that use a particle filter of size as a proposal mechanism. Superficially, it appears that the 
algorithm replaces the intractable ^ by the unbiased estimator provided by the PF within an 
MCMC algorithm that samples from 7rr(6'). However, Andrieu et al. (2010) show that (a) as 



grows, the PMCMC algorithm behaves more and more like the theoretical MCMC algorithm which 
targets the intractable ttt{0); and (b) for any fixed value of N^, the PMCMC algorithm admits 
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ttt{9t xo;t) as a stationary distribution. The exactness (in terms of not perturbing the stationary 
distribution) follows from demonstrating that the PMCMC is an ordinary MCMC algorithm (with 
specific proposal distributions) on an expanded model which includes the PF as auxiliary variables; 
when = 1 this augmentation collapses to the more familiar scheme of imputing the latent states. 



1.3. Proposed algorithm 

SMC^ is a generic black box tool for performing sequential analysis of state-space models, which 
can be seen as a natural extension of both IBIS and PMCMC. To each of the Ng 6*— particles 
0™, we attach a PF which propagates x— particles; due to the nested filters we call it the 
SMC^ algorithm. Unlike the implementation of IBIS which carries an exact filter, in this case 
the PFs only produce unbiased estimates of the marginal likelihood. This ensures that the 9- 



particles are properly weighted for nt{9), in the spirit of the random weight PF of e.g. Fearnhead 



et al. (2010a). The connection with the auxiliary representation underlying PMCMC is pivotal for 



designing the MCMC rejuvenation steps, which are crucial for the success of IBIS. We obtain a 
sequential auxiliary Markov representation, and use it to formally demonstrate that our algorithm 
explores the sequence defined in ([!]). The case — oo corresponds to an (unrealisable) IBIS 
algorithm, whereas iV^, = 1 to an importance sampling scheme, the variance of which typically 
grows polynomially with t { Chopin[ |2004 1 . 

SMC^ is a sequential but not an on-line algorithm. The computational load increases with 
iterations due to the associated cost of the MCMC steps. Nevertheless, these steps typically occur 
at a decreasing rate (see Section 3.8 for details). The only on-line generic algorithm for sequential 
analysis of state-space models we are aware of is the self-organizing particle filter (SOPF) of 
Kitagawa (1998): this is PF applied to the extended state Xt = {xt,9), which never updates the 



^-component of particles, and typically diverges quickly over time (e.g. Doucet et al.[ 2009); see 
also Liu and West (2001) for a modification of SOPF which we discuss later. Thus, a genuinely 
on-line analysis, which would provide constant Monte Carlo error at a constant CPU cost, with 
respect to all the components of {xi;t,9) may well be an unattainable goal. This is unfortunate, 
but hardly surprising, given that the target ttj is of increasing dimension. For certain models with 
a specific structure (e.g the existence of sufficient statistics), an on-line algorithm may be obtained 
by extending SOPF so as to include MCMC updates of the ^-component, see|Gilks and Berzuini| 



( |2001[ ), |Fearnhead| ( |2002[ ), |Storvik| ( |2002[ ), and also the more recent work of |Carvalho et al.| ( |2010[ ), 
but numerical evidence seems to indicate these algorithms degenerate as well, albeit possibly at a 
slower rate; see e.g. Doucet et al. (20091. On the other hand, SMC^ is a generic approach which 



does not require such a specific structure. 

Even in batch estimation scenarios SMC^ may offer several advantages over PMCMC, in the 



same way that SMC approaches may be advantageous over MCMC methods (Neal 2001 Chopin 
2002| |Cappe et all |2004| |Del Moral et al-l |2006| | Jasra et al.j |2007|) 



Under certain conditions 



(which relate to the asymptotic normality of the maximizer of ([2)) SMC^ has the same complexity 
as PMCMC. Nevertheless, it calibrates automatically its tuning parameters, as for example and 



the proposal dist r ibutio ns for 9. (Note adaptive versions of PMCMC, see e.g. Silva et al. (2009) 
and [Peters et al. (2010) exist however.) Then, the first iterations of the SMC^ algorithm make it 
possible to quickly discard uninteresting parts of the sampling space, using only a small number 
of observations. Finally, the SMC^ algorithm provides an estimate of the evidence (marginal 
likelihood) of the model p{yi:T) as a direct by-product. 

We demonstrate the potential of the SMC^ on two classes of problems which involve multidi- 
mensional state processes and several parameters: volatility prediction for financial assets using 
Levy driven stochastic volatility models, and likelihood assessment of athletic records using time- 
varying extreme value distributions. A supplement to this article (available on the third author's 
web-page) contains further numerical investigations with the SMC^ and competing methods on 
more standard examples. 

Finally, it has been pointed to us that Fulop and Li (2011 ) have developed independently and 
concurrently an algorithm similar to SMC^. Distinctive features of our paper are the generality of 
the proposed approach, so that it may be used more or less automatically on complex examples 
(e.g. setting dynamically), and the formal results that establish the validity of the SMC^ 
algorithm, and its complexity. 
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1.4. Plan, notations 

The paper is organised as follows. Section [2] recalls the two basic ingredients of SMC'^: the PF 
and the IBIS. Section [s] introduces the SMC^ algorithm, provides its formal justification, discusses 
its complexity and the latitude in its implementation. Section [4] carries out a detailed simulation 
study which investigates the performance of SMC^ on particularly challenging models. Section [s] 
concludes. 

As above, we shall use extensively the concise colon notation for sets of random variables, e.g. 
xl'^'^ is a set of random variables a;", n — 1, . . . ^N^, is the union of the sets x]^'^'' , 

s — 1, . . . ,t, and so on. In the same vein, 1 : stands for the set {1, . . . , Nx}. Particle (resp. 
time) indices are always in superscript (resp. subscript). The letter p refers to probability densities 
defined by the model, e.g. p{0), p(jji:t\d), while ttj refers to the probability density targeted at time 
t by the algorithm, or the corresponding marginal density with respect to its arguments. 

2. Preliminaries 

2. 1 . Particle filters (PFs) 

We describe a particle filter that approximates recursively the sequence of filtering densities 
T^tixtld) = pixt\yi:t,(^), for a fixed parameter value 9. The formalism is chosen with view to 
integrating this algorithm into SMC^. We first give a pseudo-code version, and then we detail 
the notations. Any operation involving the superscript n must be understood as performed for 
n e 1 : N^, where is the total number of particles. 

Step 1: At iteration t ~ 1, 

(a) Sample ^ 9i,e(-)- 

(b) Compute and normalise weights 

Qi,0[xi) z]i=i^i,e(a;i) 

Step 2: At iteration t = 2 : T, 

(a) Sample the index a"_i ^ of the ancestor of particle n. 

(b) Sample x'l qt,ei-\xt!ri )■ 

(c) Compute and normalise weights 

, a-_, „, feix]^\xtri)g9{yt\x^) wt,eixtri\x2) 



qt,e{x^\x,!_-,') Efji wtA^Z'i , x\) 



In this algorithm, A^(M^/_l^g) stands for the multinomial distribution which assigns probability 



to outcome n G 1 



Nx, and (qt,e)i^i.rp stands for a sequence of conditional proposal 
distributions which depend on 9. A standard, albeit sub-optimal, choice is the prior, qi^g(xi) = 

fj,g{xi), qt^e{xt\xt-i) = fe{xt\xt-i) for t > 2, which leads to the simplification wt^eix^!^^^ ,x") 



geivt 




et al. 


1993 


et al. 


1999 



). We note in passing that Step (a) is equivalent to multinomial resampling (e.g. Gordon 



Other, more efficient schemes exist (Liu and Chen 1998 Kitagawa 1998 Carpenter 
but are not discussed in the paper for the sake of simplicity. 



At iteration i, the following quantity 
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is an unbiased estimator p{yt\yi:t-i, d)- More generally, it is a key feature of PFs that 



s=2 \n=l 



is also an unbiased estimator oi p{yi-t\9)] this is not a straightforward result, see Proposition 7.4.1 



Del Moral , ( ,20041 . We denote by Vi,e(a;i for t = 1, and A,d{x'^\t ^ <^i-.t-i) t > 2, the 
joint probability density of all the random variables generated during the course of the algorithm 
up to iteration t. Thus, the expectation of the random variable Zt{9, x\'.^'' , al'.^J'i) with respect to 
ipt,9 is exactly p{yi;t\9). 

2.2. Iterated batch i mportance sam pling (IBIS) 

The IBIS approach of |Chopin| p002] ) is an SMC algorithm for exploring a sequence of parameter 



posterior distributions 7rt(0) = p{9\yi;t)- AH the operations involving the particle index m must be 
understood as operations performed for all m G 1 : Ng, where Ng is the total number of 0-particles. 

Sample 6*™ from p{9) and set ^ 1. Then, at time t = 1 : T 

(a) Compute the incremental weights and their weighted average 

No 

Md"! = p{yt\yi:t^i,0"^), Lt = -j^ X g c.™^it(r'), 

with the convention p(?/i|yi:o, 0) — p{yi\9) for t ~ 1. 

(b) Update the importance weights, 

^ w'"u((6''"). (4) 

(c) If some degeneracy criterion is fulfilled, sample 9™" independently from the mixture distribution 

Ne 



■ 1 m=l 



m—1 



Finally, replace the current weighted particle system, by the set of new, unweighted particles: 



Chopin| ( |2004[ ) shows that 

is a consistent and asymptotically (as Ng — ^ oo) normal estimator of the expectations 

E[^{9)\y,..t] = J ^{9)p{9\y,..t)d9, 

for all appropriately integrable ip. In addition, each Lf, computed in Step (a), is a consistent and 
asymptotically normal estimator of the likelihood p{yt\yi:t-i)- 

Step (c) is usually decomposed into a resampling and a mutation step. In the above algorithm 
the former is done with the multinomial distribution, where particles are selected with probability 



proportional to w™. As mentioned in Section 2.1 other resampling schemes may be used instead 



The move step is achieved through a Markov kernel Kt which leaves p{9\yi;t) invariant. In our 
examples Kt will be a Metropolis-Hastings kernel. A significant advantage of IBIS is that the 
population of 6'-particles can be used to learn features of the target distribution, e.g by computing 

^ Ne . Ng 



Z^m=l ^ m=l Z^m=l ^ 
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New particles can be proposed according to a Gaussian random walk 9'^\9™ ~ N{9™',cT,), where 
c is a tuning constant for achieving optimal scaling of the Metropolis-Hastings algorithm, or in- 
dependently 6™ ~ iV(/t, E) as suggested in Chopin (2002). A standard degeneracy criterion is 
ESS < 'jNg, for 7 € (0, 1), where ESS stands for "effective sample size" and is computed as 



ESS = 



(5) 



Theory and practical guidance on the use of this criterion are provided in Sections 3.7 and |4] 
respectively. 

In the context of state-space models IBIS is a theoretical algorithm since the likelihood in- 
crements p{yt\yi:t-i,d) (used both in Step 2, and implicitly in the MCMC kernel) are typically 
intractable. Nevertheless, coupling IBIS with PFs yields a working algorithm as we show in the 
following section. 



3. Sequential parameter and state estimation: the SIVIC^ algorithm 

SMC^ is a natural amalgamation of IBIS and PP. We first provide the algorithm, we then demon- 
strate its validity and we close the section by considering various possibilities in its implementation. 
Again, all the operations involving the index m must be understood as operations performed for 
all TO e 1 : Ng. 



Sample 6*™ from p{e) and set ^ 1. Then, at time i = 1, . . . , T, 



(a) For each particle 6*™, perform iteration t of the PP described in Section 2.1 lit — 1^ sample 



independently a;}'^°""' from V'i,e"», and compute 



n—1 

It r > 1, sample Ixj ,a^_i I irom ipt,e'^ conditional on [Xi.^_{ jO,i-t-2 )' ^^"^ com- 
pute 

n=l 

(b) Update the importance weights, 

^"^c^™p(2/t|yi.t_i,0"). (6) 

(c) If some degeneracy criterion is fulfilled, sample (^9"^ independently from the 

mixture distribution 

Z^m=l ^ m=l 



where Kt is a PMCMC kernel described in Section [3T2| Finally, replace the current weighted 
particle system by the set of new unweighted particles: 



The degeneracy criterion in Step (c) will typically be the same as for IBIS, i.e., when the ESS drops 
below a threshold, where the ESS is computed as in ^ and the w^'s are now obtained in 
We study the stability and the computational cost of the algorithm when applying this criterion 
in Section IXTl 
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3. 1 . Formal justification of SMC^ 

A proper formalisation of the successive importance sampling steps performed by the SMC^ algo- 
rithm requires extending the sampling space, in order to include all the random variables generated 
by the algorithm. 

At time t ~ 1, the algorithm generates variables 6*™ from the prior p{9), and for each 9™, the 
algorithm generates vectors x\' of particles, from ipi gm{xi'^''). Thus, the sampling space is 
Q X , and the actual "particles" of the algorithm are Ng independent and identically distributed 
copies of the random variable (6*, a;}'^""), with density: 



ra=l 



Then, these particles are assigned importance weights corresponding to the incremental weight 
function Ziie,x];-^-) = TV"! J2n=i WiA^^l). This means that, at iteration 1, the target distribu- 
tion of the algorithm should be defined as: 



p{vi) 



where the normalising constant p{yi) is easily deduced from the property that Zi{9,x\^'') is an 
unbiased estimator of p{yi\9). To understand the properties of tti, simple manipulations suffice. 
Substituting i(;i^e(a;"), 7/'i.e(a;J'^^) and Zi{0,x\'^'') with their respective expressions. 



1 (0) f 

^ n=l P^y^) [i=l,i5^„ 

and noting that, for the triplet {9,xi,yi) of random variables, 

p{9)fie{xi)ge{yi\xi) ^p{e,xi,yi) = p{yi)p{9\yi)p{xi\yi,9) 
one finally gets that: 

Mo,xr^) = ^Ep(-?i^1'^)( ft 

The following two properties of tti are easily deduced from this expression. First, the marginal 
distribution of 9 is p{9\yi). Thus, at iteration 1 the algorithm is properly weighted for any N^- Sec- 
ond, conditional on 0, tti assigns to the vector xl'^'' a mixture distribution which with probability 
1/Nrc, gives to particle n the filtering distribution p{xi\yi,9), and to all the remaining particles the 
proposal distribution qi^. The notation reflects these properties by denoting the target distribution 
of SMC^ by TTi, since it admits the distributions defined in ([T]) as marginals. 

By a simple induction, one sees that the target density ttj at iteration t > 2 should be defined 

as: 

n,(9,x\:^^^,a]:^^^p(9)M-]f^,-^^i) - '^ '^ 'f"*'^^ (7) 

p[yi:t) 

where Zt{9,x\'.^'' ,a\'.^j!:i) defined in ([s]), that is, it should be proportional to the sampling 
density of all random variables generated so far, times the product of the successive incremen- 
tal weights. Again, the normalising constant p{yi-t) in ([t]) is easily deduced from the fact that 
, xl'.^'' , al\^J:i) is an unbiased estimator of p{yi;t\9) . The following Proposition gives an alter- 
native expression for tt^. 
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Proposition 1. The probability density nt may be written as: 



(8) 



71—1 



t-i 



n '^mK) n n 



w: 



s=2 i=l 



where x".j and h" are deterministic functions of x\.f^ 



and aj;^i defined as follows: hj 



(h"(l), . . . , h"(t)) denote the index history of x"^ , that is, h"(i) = n, and h"(s) = a^* recur- 



sively, for s = t — 1, 



i.e. y^ltis) = X 



1. anc? x".f 
/or s = 1, . . 



= (X?:t(l),. 



, x"((i)) denote the state trajectory of particle 



A proof is given in Appendix A. We use a bold notation to stress out that the quantities x"^ 
I h" are quite different from particle arrays such as e.g. 2;^^°' ■ ^iit ^'^d h" provide the complete 
ealogy of the particle with label n at time t, while x^.^ simply concatenates the successive 
tide arrays x^'^'' , and contains no such genealogical information. 

It follows immediately from expression ([S]) that the marginal distribution of tt^ with respect 



to 6 is p(0|yi:t). Conditional on 6 the remaining random variables, x{.^ ^ and a{.f.^i, have a mix- 
ture distribution, according to which, with probability l/N^ the state trajectory x"^ is generated 

according to p{xi;t\9,yi;t), the ancestor variables corresponding to this trajectory, a^' '^'^^ are uni- 
formly distributed within 1 : , and all the other random variables are generated from the particle 
filter proposal distribution, ijjt,6- Therefore, Proposition[l]establishes a sequence of auxiliary distri- 
butions TTt on increasing dimensions, whose marginals include the posterior distributions of interest 
defined in ([T]). The SMC^ algorithm targets this sequence using SMC techniques. 



3.2. The MCMC rejuvenation step 

To formally describe this step performed at some iteration t, we must work, as in the previous 



section, on the extended set of variables {9, x^: 



t 



^l:t- 



i). The algorithm is described below; if the 



proposed move is accepted, the set of variables is replaced by the proposed one, otherwise it is left 
unchanged. The algorithm is based on some proposal kernel T{9,d0) in the 0— dimension, which 
admits probability density T{9, 9). (The proposal kernel for 9, T{9, •), may be chosen as described 



in Section 2.2 



(a) Sample 9 from proposal kernel, 9 ^ T{9, d9) 

(b) Run a new PF for 9: sample independently (J}: 

(c) Accept the move with probability 



t '"l:t-ly 



from 1p^ g, and compute Zt{9, x\. 



1 A 



p{9)Z,{0,S:\-^%~ai-!^-i)n9,9) 
p{9)U9,xl^%a\.;^f,)T{9,9) 



'lit-li- 



It directly follows from ([7| that this algorithm defines a standard Hastings-Metropolis kernel 
with proposal distribution 



90 1 



~l:N^ ~1:N, 



and admits as invariant distribution the extended distribution 7rj( 



\). In the broad 



PMCMC framework, this scheme corresponds to the so-called particle Metropolis-Hastings algo- 



rithm (see Andrieu et al. 2010 1 . It is worth pointing out an interesting digression from the PMCMC 
framework. The Markov mutation kernel has to be invariant with respect to TTt, but it does not 
necessarily need to produce an ergodic Markov chain, since consistency of Monte Carlo estimates 
is achieved by averaging across many particles and not within a path of a single particle. Hence, 
we can also attempt lower dimensional updates, e.g using a Hastings-within-Gibbs algorithm. The 
advantage of such moves is that they might lead to higher acceptance rates for the same step size 
in the ^-dimension. However, we do not pursue this point further in this article. 
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3.3. PMCMC's invariant distribution, state inference 

From ([8|, one may rewrite ttj as the marginal distribution of {9,x\'.^^ ,a\'.^j^i) with respect to an 
extended distribution that would include a uniformly distributed particle index n* G 1 : N^: 

P{^l.t\e,v,.,){ n 91..^)^^ n w'f-^.qsA^Wxt-,')) . (9) 

! = l s=2 i=l 



Andrieu et al. (2010) formalise PMCMC algorithms as MCMC algorithms that leaves tt^ invari- 
ant, whereas in the previous section we justified our PMCMC update as a MCMC step leaving ttj 
invariant. This distinction is a mere technicality in the PMCMC context, but it becomes important 
in the sequential context. SMC^ is best understood as an algorithm targetting the sequence (TTt): 
defining importance sampling steps between successive versions of tt^ seems cumbersome, as the 
interpretation of n* at time t does not carry over to iteration t -\- 1. This distinction also relates 



to the concept of Rao-Blackwellised (marginalised) particle filters (Doucet et al. 2000): since TTi is 
a marginal distribution with respect to tt^, targetting tti rather than ttj leads to more efficient (in 
terms of Monte Carlo variance) SMC algorithms. 

The interplay between ttj and tt^ is exploited below and in the following sections in order to fully 
realize the implementation potential of SMC^. As a first example, direct inspection of Q reveals 
that the conditional distribution of n* , given 9, x^.^'' and aj:^]^, is M.{Wl'^''), the multinomial 
distribution that assigns probability W^g to outcome n, n € 1 : N^. Therefore, weighted samples 
from p{6, xi;t\yi:t) may be obtained at iteration t as follows: 

(a) For TO = 1, . . . , A^e, draw index n*{m) from M.{Wl'^). 

(b) Return the weighted sample 

where x".'™ was defined in Proposition [T] 

This temporarily extended particle system can be used in the standard way to make inferences about 
Xt (filtering), yt+i (prediction) or even Xi;t (smoothing), under parameter uncertainty. Smoothing 
requires to store all the state variables xl'.^""^'^" , which is expensive, but filtering and prediction 
may be performed while storing only the most recent state variables, x]'^'^'^''^" . We discuss more 
thoroughy the memory cost of SMC^, and explain how smoothing may still be carried out at certain 



times, without storing the complete trajectories, in Section 3.7 

The phrase temporarily extended in the previous paragraph refers to our discussion on the 
difference between tt^ and tt^. By extending the particles with a n* component, one temporarily 
change the target distribution, from tt* to tt^. To propagate to time t + 1, one must revert back 
to TTt, by simply marginaling out the particle index n*. We note however that, before reverting to 
TTt, one has the liberty to apply MCMC updates with respect to ttJ". For instance, one may update 
the 0— component of each particle according to the full conditional distribution of 9 with respect 
to to TTf, that is, p(0|x".j, ?/i.j). Of course, this possibility is interesting mostly for those models 
such that p{9\x.".^,yi-t) is tractable. And, again, this operation may be performed only if all the 
state variables are available in memory. 



3. 4. Reusing all t tie x- particles 

The previous section describes an algorithm for obtaining a particle sample (w™, 0™, x"./'"''''")mgi:Arg 
that targets p{9,xi;t\yi:t)- One may use this sample to compute, for any test function h{9,xi;t), 
an estimator of the expectation of h with respect to the target p{9,xi;t\yi:t)- 

y u:"'h{9"\^tt'^'>^"')- 

ENe m ^ \ ' i.t I 

m,=l m=l 
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As in Andrieu et al. (2010 Section 4.6), we may deduce from this expression a Rao-Blackwellised 
estimator, by marginalising out n*, and re-using all the a;-particles: 



m— 1 



The variance reduction obtained by this Rao-Blackwellisation scheme should depend on the vari- 
x"'™) with respect to n. For a fixed m, the components x".'™(s) of the tra- 



ability of /i(6l" 

jectories x";™ are diverse when s is close to t, and degenerate when s is small. Thus, this 
Rao-Blackwellisation scheme should be more efficient when h depends mostly on recent state val- 
ues, e.g. h{9,xi;t) — h(xt), and less efficient when h depends mostly on early state values, e.g. 
h{9,xi;t) = h{xi). 



3.5. Evidence 

The evidence of the data obtained up to time t may be decomposed using the chain rule: 



P{yi:t) = J|p(?/s|yi:s-l). 



s=l 

The IBIS algorithm delivers the weighted averages Lg, for each s = 1, . . . , i, which are Monte Carlo 



estimates of the corresponding factors in the product; see Section 2.2 Thus, it provides an estimate 
of the evidence by multiplying these terms. This can also be achieved via the SMC^ algorithm in 
a similar manner: 

where piytlvi-.t-ij ^'") is given in the definition of the algorithm. It is therefore possible to estimate 
the evidence of the model, at each iteration t, at practically no extra cost. 



3. 6. Automatic calibration ofNj. 

The plain vanilla SMC^ algorithm assumes that stays constant during the complete run. This 
poses two practical difficulties. First, choosing a moderate value of Nj. that leads to a good per- 
formance (in terms of small Monte Carlo error) is typically difficult, and may require tedious pilot 
runs. As any tuning parameter, it would be nice to design a strategy that determines automatically 
a reasonable value of N^. Second, Andrieu et al.| (2010l show that, in order to obtain reasonable 
acceptance rates for a particle Metropolis- Hastings step, one should take = 0{t), where t is the 
number of data-points currently considered. In the SMC^ context, this means that it may make 
sense to use a small value for for the early iterations, and then to increase it regularly. Finally, 
when the variance of the PF estimates depends on 9, it might be interesting to allow to change 
with 9 as well. 

The SMC^ framework provides more scope for such adaptation compared to PMCMC. In this 
section we describe two possibilities, which relate to the two main particle MCMC methods, particle 
marginal Metropolis-Hastings and particle Gibbs. The former generates the auxiliary variables 
independently of the current particle system whereas the latter does it conditionally on the current 
system. For this reason the latter yields a new system without changing the weights, which is a 
nice feature, but it requires storing particle histories, which is memory inefficient; see Section [3.7| 
for a more thorough discussion of the memory cost of SMC^. 

The schemes for increasing can be integrated into the main SMC^ algorithm along with 
rules for automatic calibration. We propose the following simple strategy. We start with a small 
value for , we monitor the acceptance rate of the PMCMC step and when this rate falls below 
a given threshold, we trigger the "changing N^" step; for example we multiply by 2. 
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3.6.1. Exchange importance sampling step 

Our first suggestion involves a particle exchange. At iteration t, the algorithm has generated so far 
the random variables , xl'.^'"^'^^ and a\'.^J:{^'^^ and the target distribution is TTt{6, xjif^"' , aj:^^). 
At this stage, one may extend the sampling space, by generating for each particle 0™, new PFs 
of size Nx, by simply sampling independently, for each m, the random variables Xj^.^ jCl^.^j^^ 
from ipt,e'^ ■ Thus, the extended target distribution is: 

Mv.x^.^ ^ ,a•^.^^^)■^|^tAXl■.t ,ai:t-i)- (10) 

In order to swap the a;— particles and the i— particles, we use the generalised importance sampling 
strategy of Del Moral et al. ( 2006 1 , which is based on an artificial backward kernel. Using ([7| , we 



compute the incremental weights 

X 



(11) 



where is a backward kernel density. One then may drop the "old" particles , aj:^]^) in 

order to obtain a new particle system, based on particles (0, ij:^^ , aj:^j^) targetting tt^, but with 
Nx, X— particles. 

This importance sampling operation is valid under mild assumptions for the backward kernel 



Lt] namely that the support of the denominator of ( 11 ) is included in the support of its numerator. 
One easily deduces from Proposition 1 of Del Moral et al. ( 2006 1 and ([t]) that the optimal kernel 



(in terms of minimising the variance of the weights) is 



Piyi:t\o) 



This function is intractable, because of the denominator p{yi.t\9), but it suggests the following 
simple approximation: Lt should be set to ipt,e, so as to cancel the second ratio, which leads to the 
very simple incremental weight function: 



,,exch n 1:N^ „l-N^ -l:Wx xl:N. 



Zjt[t^,Xi.f_ ,ai:t_lj 

By default, one may implement this exchange step for all the particles 0^-^", and multiply 
consequently each particle weight oj™ with the ratio above. However, it is possible to apply this 
step to only a subset of particles, either selected randomly or according to some deterministic 
criterion based on 6. (In that case, only the weights of the selected particles should be updated.) 
Similarly, one could update certain particles according to a Hastings-Metropolis step, where the 
exchange operation is proposed, and accepted with probabilty the minimum of 1 and the ratio 
above. 

In both cases, one effectively targets a mixture of iTt distributions corresponding to different 
values of iV^;. This does not pose any formal difficutly, because these distributions admit the same 
marginal distributions with respect to the components of interest {6, and xi-t if the target distri- 



bution is extended as described in Section 3.3), and because the successive importance sampling 
steps (such as either the exchange step above, or Step (b) in the SMC^ Algorithm) correspond to 
ratios of densities that are known up to a constant that does not depend on iV^,. 

Of course, in practice, propagating PF of varying size is a bit more cumbersome to imple- 
ment, but it may show useful in particular applications, where for instance the computational cost 
of sampling a new state Xt+i, conditional on xt, varies strongly according to 0. 
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3.6.2. Conditional SMC step 

Whereas the exchange steps associates with the target irt, and the particle Metropolis-Hastings 
algorithm, our second suggestion relates to the target tt^, and to the particle Gibbs algorithm. 
First, one extends the target distribution, from tt^ to tt^, by sampling a particle index n* , as 



explained in Section 3.3 Then one may apply a conditional SMC step (Andrieu et al. 20101, to 



generate a new particle filter of size N^, Xi'. 



l:t- 



fj^, but conditional on one trajectory being equal 



to x" j. This amounts to sampling the conditional distribution defined by the two factors in curly 
brackets in (|9]), which can also be conveniently rewritten as 



7V^<(n^0,^j:f^al:;^) 



Pi0,K.t\yi:t) 



We refrain from calling this operation a Gibbs step, because it changes the target distribution 
(and in particular its dimension), from 7rt(6', xjif^^" , ajif^"') to 7rt(6', xjif , ajif'"'). A better formali- 



sation is again in terms of an importance sampling step involving a backward kernel (Del Moral 
et al.[[2006 ^, from the proposal distribution, the current target distribution iTt times the conditional 
distribution of the newly generated variables: 



towards target distribution 



1:N, 
lit 



t 



p{e,^Uyi:t) 



~1:N^ ~1:N^ \ 
Xl:t i"l:t-i;7 



where Lt is again an arbitrary backward kernel, whose argument, denoted by a dot, is all the 
variables in (xjif'"' , aj:^]^), except the variables corresponding to trajectory x"^. It is easy to see 
that the optimal backward kernel (applying again Proposition 1 of Del Moral et al.||2006 1 is such 
that the importance sampling ratio equals one. The main drawback of this approach is th at it 
requires to store all the state variables x\'. 



; see our dicussion of memory cost in Section 3.7 



3.7. Complexity 
3.7.1. Memory cost 

In full generality the SMC^ algorithm is memory-intensive: up to iteration t, 0{tNgNx) variables 
have been generated and potentially have to be carried forward to the next iteration. We explain 
now how this cost can be reduced to 0{NqN.j.) with little loss of generality. 

Only the variables XjlfJ""^'^" are necessary to carry out Step (a) of the algorithm, while all other 
state variables be discarded. Additionally, when Step (c) is carried out as described 

in Section 3.2 Zt is the only additional necessary statistic of the particle histories. Thus, the 
typical implementation of SMC^ for sequential parameter estimation, filtering and prediction has 
an 0{NgNx) memory cost. The memory cost of the exchange step is also 0{NxNg); more precisely, 
it is 0{Nj.Ng), where is the new size of the PF's. A nice property of this exchange step is that 

it temporarily regenerates complete trajectories xjif^""™, sequentially for m = 1,...,M. Thus, 
besides augmenting Nx dynamically, the exchange step can also be used to to carry out operations 
involving complete trajectories at certain pre-defined times, while maintaining a 0{NgNx) overall 
cost. Such operations include inference with respect to 7r((0, xi:*), updating 9 with respect to the 
full conditional p{6\xi:t,yi:t), as explained in Section [3.3[ or even the conditional SMC update 
descrided in Section fS. 6. 21 



3.7.2. Stability and computational cost 

Step (c), which requires re-estimating the likelihood, is the most computationally expensive compo- 
nent of SMC^. When this operation is performed at time t, it incurs an 0{tNgNx) computational 
cost. Therefore, to study the computational cost of SMC^ we need to investigate the rate at which 
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ESS drops below a given threshold. This question directly relates to the stability of the filter, and 
we will work as in Section 3.1 of Chopin (20041 to answer it. Our approach is based on certain 



simplifying assumptions, regularity conditions and a recent result of Cerou et al. (20111 which all 



lead to Proposition [2j the assumptions are discussed in some detail in Appendix B. 

In general, ESS/iVe < 7, for ESS given in is a standard degeneracy criterion of sequential 
importance sampling due to the fact that the limit of ESS /Ng as Ng ^ 00 is equal to the inverse of 
the second moment of the importance sampling weights (normalized to have mean 1). This limiting 
quantity, which we will generically denote by £, is also often called effective sample size since it 
can be interpreted as an equivalent number of independent samples from the target distribution 
(see Section 2.5.3 of Liu 2008 for details). The first simplification in our analysis is to study the 
properties of £, rather than its finite sample estimator ESS/Ng, and consider an algorithm which 
resamples whenever f < 7. 

Consider now the specific context of SMC^. Let t be a resampling time at which equally 
weighted, independent particles have been obtained, and t + p, p > 0, a future time such that no 
resampling has happened since t. The marginal distribution of the resampled particles at time t is 
only approximately tt^ due to the burn-in period of the Markov chains which are used to generate 
them. The second simplifying assumption in our analysis is that this marginal distribution is 
precisely nt. Under this assumption, the particles at time t + p are generated according to the 
distribution Tfj j+p. 



l:Afx 



^l:t+p-l 



and the expected value of the weights uJt+p obtained from ^ is p{yi:t) / Pivi-.t+p) ■ Therefore, the 
normalized weights are given by 



P{Vl:t 

p{yi:t+p) fJi 



Y[p{y^ 



't+t\yi:t+i-l,t 



A Zt+p\t( 



1 ■^l■.t+p^ "-l-.t+p-lJ 



P{yt+l:t+p\yi:t) 



and the inverse of the second moment of the normalized weights in SMC^ and IBIS is given by 



E^" - I E- 



P{yt+l:t+p\yi:t)'^ 



^t^t+p - \ ^p{e\vi:t) 



p{0\yi:t+pf 
P{e\yi:t? 



The previous development leads to the following Proposition which is proved in Appendix B. 

Proposition 2. (a) Under Assumptions (Hla) and (Hlh) in Appendix B, there exists a 
constant rj > such that for any p, if > rjp, 



(12) 



(b) Under Assumptions (H2a)-(H2d) in Appendix B, for any 7 > there exist T,r] > and 
to < CO, such that for t > to, 



^tl+p > 7, for p = [rt] , N,^ = [77*] 



The implication of this Proposition is the following: under the assumptions in Appendix B 
and the assumption that the resampling step produces samples from the target distribution, the 
resample steps should be triggered at times [t'^] , k > 1, to ensure that the weight degeneracy 
between two successive resampling step stays bounded in the run of the algorithm; at these times 
Nx should be adjusted to = [ryr*^] ; thus, the cost of each successive importance sampling step 
is 0{NgT^), until the next resampling step; a simple calculation shows that the cumulative compu- 
tational cost of the algorithm up to some iteration t is then 0{Ngt^). This is to be contrasted with 
a computational cost 0{Ngt) for IBIS under a similar set of assumptions. The assumptions which 
lead to this result are restrictive but they are typical of the state of the art for obtaining results 
about the stability of this type of sequential algorithms; see Appendix B for further discussion. 
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4. Numerical illustrations 

An initial study which iUustrates SMC^ in a range of examples of moderate difficulty is available 
from the second author's web-page, see http://sites.google.com/site/pierrejacob/, as supplemen- 
tary material. In that study, SMC^ was shown to typically outperform competing algorithms, 
whether in sequential scenarios (where datapoints are obtained sequentially) or in batch scenarios 
(where the only distribution of interest is p{0,xi:T\yi:T) for some fixed time horizon T). For in- 
stance, in the former case, SMC^ was shown to provide smaller Monte Carlo errors than the SOPF 
at a given CPU cost. In the latter case, SMC^ was shown to compare favourably to an adaptive 



version of the marginal PMCMC algorithm proposed by Peters et al. (2010). 

In this paper, our objective instead is to take a hammer to SMC^, that is, to evaluate its 
performance on models that are regarded as particularly challenging, even for batch estimation 
purposes. In addition, we treat SMC^ as much as possible as a black box: the number of 



x-particles is augmented dynamically (using the exchange step, see Section 3.6.1), as explained in 



Section ^76) the move steps are calibrated using the current particles, as described at the end of 



Section 2.2 and so on. The only model-dependent inputs are (a) a procedure for sampling from the 
Markov transition of the model, fg{xt+i\xt); (b) a procedure for pointwise evaluation the likehhood 
9eiyt\xt)', and (c) a prior distribution on the parameters. This means that the proposal qt.e is set 
to the default choice fg{xt+i\xt)- This also means that we are able to treat models such that the 
density fg{xt+i\xt) cannot be computed, even if it may be sampled from; this is the case in the 
first application we consider. 

A generic SMC^ software package written in Python and C by the second author is available 
at |http://code.google.com/p/py-smc2/| 



4. 1 . Sequential prediction of asset price volatility 

SMC^ is particularly well suited to tackle several of the challenges that arise in the probabilistic 
modelling of financial time series: prediction is of central importance; risk management requires 
accounting for parameter and model uncertainty; non-linear models are necessary to capture the 
features in the data; the length of typical time series is large when modelling medium/low frequency 
data and vast when considering high frequency observations. 

We illustrate some of these possibilities in the context of prediction of daily volatility of asset 
prices. There is a vast literature on stochastic volatility (SV) models; we simply refer to the 
excellent exposition in Barndorff-Nielsen and Shephard (2002) for references, perspectives and 
second-order properties. The generic framework for daily volatility is as follows. Let st be the 
value of a given financial asset (e.g a stock price or an exchange rate) on the t-th day, and yt = 
10^/^ log(s(/st_i) be the so-called log-returns (the scaling is done for numerical convenience). The 
SV model specifies a state-space model with observation equation: 



1/2 

yt = + Pvt + et ,t>l 



(13) 



where the et is a sequence of independent errors which are assumed to be standard Gaussian. The 
process is known as the actual volatility and it is treated as a stationary stochastic process. This 
implies that log-returns are stationary with mixed Gaussian marginal distribution. The coefficient 
/3 has both a financial interpretation, as a risk premium for excess volatility, and a statistical one, 
since for /3 7^ the marginal density of log-returns is skewed. 



We consider the class of Levy driven SV models which were introduced in Barndorff-Nielsen and 



Shephard (2001 ) and have been intensively studied in the last decade from both the mathematical 
finance and the statistical community. This family of models is specified via a continuous-time 
model for the joint evolution of log-price and spot (instantaneous) volatility, which are driven by 
Brownian motion and Levy process respectively. The actual volatility is the integral of the spot 
volatility over daily intervals, and the continuous-time model translates into a state-space model 
for yt and vt as we show below. Details can be found in Sections 2 (for the continuous-time 
specification) and 5 (for the state-space representation) of the original article. Likelihood-based 
inference for this class of models is recognized as a very challenging problem, and it has been 



undertaken among others in Roberts et al. (2004); GrifRn and Steel (2006) and most recently in 



Andrieu et al. (2010) using PMCMC. On the other hand, Barndorff-Nielsen and Shephard (2002) 
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develop quasi-likelihood methods using the Kalman filter based on an approximate state-space 
formulation suggested by the second-order properties of the {yt,vt) process. 

Here we focus on models where the background driving Levy process is expressed in terms of 
a finite rate Poisson process and consider multi-factor specifications of such models which include 
leverage. This choice allows the exact simulation of the actual volatility process, and permits direct 
comparisons to the numerical results in Sections 4 of|Roberts et al. (20041, 3.2 of Barndorff-Nielseiiil 
and Shephard (2002) and 6 of [Griffin and Steel (20061. Additionally, this case is representative 



of a system which can be very easily simulated forwards whereas computation of its transition 
density is considerably involved (see ( 14 ) below) . The specification for the one-factor model is as 
follows. We parametrize the latent process as in BarndorfF-Nielsen and Shephard (2002) in terms 
of (^,a;^,A) where ^ and are the stationary mean and variance of the spot volatility process, 
and A the exponential rate of decay of its autocorrelation function. The second-order properties 



of Vt can be expressed as functions of these parameters, see Section 2.2 of BarndorfF-Nielsen and 
Shephard ( |2002[ ). The state dynamics for the actual volatility are as follows: 



k ~ Poi (ACV^') , ci:fc U(t, t + l), ev.k Exp {^/uj^) , 



zt+i = e 



_A(t+l-Cj) 



Vt+1 



zt - Zt+1 + ^ ' 



(14) 



Xt+i = [vt+i,zt+i) 



In this representation, Zt is the discretely-sampled spot volatility process, and the Markovian 
representation of the state process involves the pair {vt,Zt)- The random variables (fc, ci:fc, ei:fe) 
are generated independently for each time period, and 1 : fc is understood as the empty set when 
fc = 0. These system dynamics imply a r(^^/aj^, ^/w^) as stationary distribution for Zf . Therefore, 
we take this to be the initial distribution for zq. 

We applied the algorithm to a synthetic data set of length T = 1, 000 (Figure [ija)) simulated 
with the values = 0, /3 = 0, ^ = 0.5, = 0.0625, A = 0.01 which were used also in the 
simulation study of Barndorff-Nielsen and Shephard (2002). We launched 5 independent runs 
using Ng = 1,000, a ESS threshold set at 50%, and the independent Hastings-Metropolis scheme 
described in Section |2.2| The number Nj. was set initially to 100, and increased whenever the 
acceptance rate went below 20% (Figure [IJb)-(c)). Figure[T]jd)-(e) shows estimates of the posterior 
marginal distribution of some parameters. Note the impact the large jump in the volatility has on 
Nx:, which is systematically (across runs) increased around time 400, and the posterior distribution 
of the parameters of the volatility process, see Figure [T]jf). 

It is interesting to compare the numerical performance of SMC^ to that of the SOPF and Liu 



and West (2001 )'s particle filter (referred to as L&W in the following) for this model and data, and 
for a comparable CPU budget. The SOPF, if run with N = 10^ particles, collapses to one single 
particle at about t = 700 and is thus completely unusable in this context. L&W is a version of 
SOPF where the ^-components of the particles are diversified using a Gaussian move that leaves the 
first two empirical moments of the particle sample unchanged. This move unfortunately introduces 
a bias which is hard to quantity. We implemented L&W with N = 2 x 10^ (x, 6')-particles and 
we set the smoothing parameter h to 10~^; see the Supplement for results with various values 
of h. This number of particles was to chosen to make the computing time of SMC^ and L&W 
comparable, see Figure |2ja). Unsurprisingly, L&W runs are very consistent in terms of computing 
times, whereas those of SMC^ are more variable, mainly because the number of x-particles does 
not reach the same value across the runs and the number of resample-move steps varies. Each of 
these runs took between 1.5 and 7 hours using a simple Python script and only one processing unit 
of a 2008 desktop computer (equipped with an Intel Core 2 Duo E8400). Note that, given that 
these methods could easily be parallelized, the computational cost can be greatly reduced; a 100 x 
speed-up is plausible using appropriate hardware. 

Our results suggest that the bias in L&W is significant. Figure [2]jb) shows the posterior 
distribution of ^, the mean of volatility, at time t — 500, which is about 100 time steps after the 
large jump in volatility at time t = 407. The results for both algorithms are compared to those from 



a long PMCMC run (implemented as in [Peters et al. 2010 and detailed in the Supplement) with 



500 and 10^ iterations. Figure [2|c) reports on the estimation of the log evidence logp(yi: 
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Fig. 1. Single-factor stochastic volatility model, synthetic dataset. (a) Squared observations vs time, (b)-(f) 
Results obtained from 5 repeated runs; (b) acceptance rate; (c) vs time; (d) to (f) overlaid kernel density 
estimators of the posterior distribution of /i, p, ^ at different times t = 250, 500, 1000, the vertical dashed line 
indicates the true value and solid red lines the prior density. 
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for each algorithm, plotting the estimated log evidence of each run minus the mean of the log 
evidence of the 5 SMC^ runs. We see that the log evidence estimated using L&W is systematically 
biased, positively or negatively depending on the time steps, with a large discontinuity at time 
t = 407, which is due to underestimation of the tails of the predictive distribution. 
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Fig. 2. Single-factor stochastic volatility model, synthetic dataset, comparison between methods, (a) 
Computing time of 5 independent runs of L&W (left) and SIVIC^ (right) in seconds, against time, (b) Estimation 
of the posterior marginal distribution of mean volatility, (c) Estimation of the log evidence, the curves 
represent the estimated evidence of each run minus the mean across 5 runs of the log evidence estimated 
using SMC^ 



We now consider models of different complexity for the S&P 500 index. The data set is made 
of 753 observations from January 3rd 2005 to December 31st 2007 and it is shown on Figure [sf^a) . 
We first consider a two-factor model, according to which the actual volatility is a sum of two inde- 
pendent components each of which follows a Levy driven model. Previous research indicates that a 
two-factor model is sufficiently flexible, whereas more factors do not add significantly when consid- 
ering daily data, see for example Barndorff-Nielsen and Shephard ( 2002 1 ; Griffin and Steel ( 2006 1 
for Levy driven models and Chernov et al. ( 2003 ) for diffusion-driven SV models. We consider one 
component which describes long-term movements in the volatility, with memory parameter Ai, and 
another which captures short-term variation, with parameter A2 >> Ai. The second component 
allows more freedom in modelling the tails of the distribution of log-returns. The contribution of 



1 8 Chopin et al. 













4 


Model: 

— Multifactor without leverage 

— Multifactor with leverage 






1 




















1 


2 


3 


30 4 


50 


s 


7 






(a) 



(b) 



(c) 



Fig. 3. (a): the squares of the S&P 500 data from 03/01/2005 to 21/12/2007. (b): acceptance rates of the 
resample-move step for the full model over two independent runs, (c); log-evidence comparison between 
models (relative to the one-factor model). 



the slowly mixing process to the overall mean and variance of the spot volatility is controlled by the 
parameter w S (0,1). Thus, for this model xt = (wi,t, 2i,t, f2,t, 2^2, t) with ut = + t;2.t, where each 
pair (vi^tjZi^t) evolves according to (14 1 with parameters (u;^^, w^o;^, A^) with wi = 'w,W2 — i — w. 
The system errors are generated by independent sets of variables {ki,Ci^i;k,ei^i;k), and zo,i are 
initialized according to the corresponding gamma distributions. Finally, we extend the observa- 
tion equation to capture a significant feature observed in returns on stocks: low returns provoke 



increase in subsequent volatility, see for example Black ( 1976 ) for an early reference. In parameter 



driven SV models, one generic strategy to incorporate such feedback is to correlate the noise in the 



observation and state processes, see Harvey and Shephard ( 1996 ) in the context of the logarithmic 
SV model, and Section 3 of Barndorff-Nielsen and Shephard (2001) for Levy driven models. We 



take up their suggestion, and re-write the observation equation as 



yt = /i + /3vt 



,1/2 



et+piY^ ei J 



+ /?2 ^ e2,j 



^{wpiXi + (1 - W)p2\2) 



(15) 



where e^j- are the system error variables involved in the generation of vt and pi are the leverage 
parameters which we expect to be negative. Thus, in this specification we deal with a model with 
a 5-dimensional state and 9 parameters. 

The mathematical tractability of this family of models and the specification in terms of station- 
ary and memory parameters allows to a certain extent subjective Bayesian modelling. Nevertheless, 
since the main emphasis here is to evaluate the performance of SMC^ we choose priors that (as we 
verify a posteriori) are rather flat in the areas of high posterior density. Note that the prior for ^ 
and Lo^ has to reflect the scaling of the log-returns by a multiplicative factor. We take an Exp(l) 
prior for Ai, an Exp(0.5) for A2 — Ai, thus imposing the identifiability constraint A2 > Ai. We take 
a U(0, 1) prior for w, an Exp(l/5) for ^ and cj^, and Gaussian priors with large variances for the 
observation equation parameters. 

We launch the three models for the S&P 500 data: single factor, multifactor without and with 
leverage; note that multifactor without leverage means the full model, but with pi = p2 = in 
(15). We use Ng — 2000, and is set initially to 100 and then dynamically increases as already 



described. The acceptance rates stay reasonable as illustrated on Figure |3(b)[ Figure |3(c) shows 
the log evidence \ogp{yi;t) for the two factor models minus the log evidence for the single factor 
model. Negative values at time t means that the observations favour the single factor model up 
to time t. Notice how the model evidence changes after the big jump in volatility around time 
t ~ 550. Estimated posterior densities for all parameters are provided in the Supplement. 



4.2. Assessing extreme atliletic records 

The second application illustrates the potential of SMC^ in smoothing while accounting for pa- 
rameter uncertainty. In particular, we consider state-space models that have been proposed for 
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the dynamic evolution of athletic records, see for example Robinson and Tawn ( 1995 1, Gaetan and 



Grigoletto (2004), Fearnhead et al. (2010b). We analyse the time series of the best times recorded 
for women's 3000 metres running events between 1976 and 2010. The motivation is to assess to 
which extent Wang Junxia's world record in 1993 was unusual: 486.11 seconds while the previous 



record was 502.62 seconds. The data is shown in Figure 4(a) and include two observations per year 
y = 2/1:2, with yi < y2' yi is the best annual time and 2/2 the second best time on the race where 
2/1 was recorded. The data is available from http://www.alltime-athletics.com/ and it is further 
discussed in the aforementioned articles. A further fact that sheds doubt on the record is that the 
second time for 1993 corresponds to an athlete from the same team as the record holder. 



We use the same modelling as Fearnhead et al. (2010b I. The observations follow a generalized 



extreme value (GEV) distribution for minima, with cumulative distribution function G defined by: 



G(y|/i,C,cr) = 1 - exp 



1-e 



(16) 



where /x, ^ and a are respectively the location, shape and scale parameters, and {•}+ = max(0, •). 
We denote by g the associated probability density function. The support of this distribution 
depends on the parameters; e.g. if f < 0, 5 and G are non-zero for y > j.i + u /^. The probability 
density function for y = 7/1:2 is given by: 



5(2/i:2|/x,C,o-) {1 - G{y2\^i,i,(T)}W- 



1=1 



g(2/dA^,C,g) 

- G{y^\fj.,£„a) 



(17) 



subject to yi < 2/2- The location fi is not treated as a parameter but as a smooth second-order 
random walk process: 



xt^ {iJLt.f^t)' , xt+1 \ xt,iy Af {Fxf .Q) , F = 



and Q = 



1/3 
1/2 



1/2 



(18) 



To complete the model specification we set a diffuse initial distribution A/'(520, 10^) on /ig. Thus 
we deal with bivariate observations in time 2/t — 2/t,i:2j a state-space model with non-Gaussian 



observation density given in ( 17 ), a two-dimensional state process given in ( 18 ), and a 3-dimensional 



unknown parameter vector, = (i/, ^,(t). We choose independent exponential prior distributions 
on V and a with rate 0.2. The sign of ^ has determining impact on the support of the observation 
density, and the computation of extremal probabilities. For this application, given the form of 



( 16 ) and the fact that the observations are necessarily bounded from below, it makes sense to 
assume that ^ < 0, hence we take an exponential prior distribution on — ^ with rate 0.5. (We also 
tried a A^(0, 3^) prior, which had some moderate impact on the estimates presented below, but the 
corresponding results are not reported here.) 

The data we will use in the analysis exclude the two times recorded on 1993. Thus, in an abuse 
of notation 2/1976:2010 below refers to the pairs of times for all years but 1993, and in the model we 
assume that there was no observation for that year. Formally we want to estimate probabilities 



V 

Pt = 



< 



2/|?/l976:2010j 



G{y\^^t,^)p{^lt\yw^Q■.2ow,0)p{e\ylQ^Q,2mo)d^td9 



e Jx 



where the smoothing distribution p(/it|2/i976:20i0i ^) and the posterior distribution ^(6*12/1976:2010) 
appear explicitly; below we also consider the probabilities conditionally on the parameter values, 
rather than integrating over those. The interest lies in pf|g3^^, ^1993^^ and := p| 



aond „486.11 /„502.62 



which is the probability of observing at year t Wang Junxia's record given that we observe a better 
time than the previous world record. The rationale for using this conditional probability is to take 
into account the exceptional nature of any new world record. 

The algorithm is launched 10 times with Ng — 1,000 and — 250. The resample-move 
steps are triggered when the ESS goes below 50%, as in the previous example, and the proposal 
distribution used in the move steps is an independent Gaussian distribution fitted on the particles. 
The computing time of each of the 10 runs varies between 30 and 70 seconds (using the same 
machine as in the previous section), which is why we allowed ourselves to use a fairly large number 
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of particles compared to the small time horizon. Figure 4(b) represents the estimates at each 
year, for y = 486.11 (lower box-plots) and y = 502.62 (upper box-plots), as well as pj"'"* = 
^486.11^^502.62 (^liddle box-plots). The box-plots show the variability across the independent runs 
of the algorithm, and the lines connect the mean values computed across independent runs at each 
year. The mean value of Plgg^ over the runs is 9.4 • 10"** and the standard deviation over the runs 
is 3.3 • 10"''. Note that the estimates are computed using the smoothing algorithm described in 
Section 13.31 

The second row of Figure |4] shows the posterior distributions of the three parameters {v, ^, a) 
using kernel density estimations of the weighted 6'-particles. The density estimators obtained for 
each run are overlaid to show the consistency of the results over independent runs. The prior 
density function (full line) is nearly flat over the region of high posterior mass. The third row of 
Figure [4] shows scatter plots of the probabilities G'(?;|/i"gg™', 0™) against the parameters 0™. The 
triangles represent these probabilities for y = 486.11 while the circles represent the probabilities 
for y = 502.62. The cloud of points at the bottom of these plots correspond to parameters for 
which the probability G(486.11|Ai"99^™', 0") is exactly 0. 

5. Extensions 

In this paper, we developed an "exact approximation" of the IBIS algorithm, that is, an ideal 
SMC algorithm targetting the sequence iTtiO) — p{0\yi:t), with incremental weight TTt{0) /iTt^iid) = 
p{yt\yi:t-i,d)- The phrase "exact approximation", borrowed fromlAndrieu et al. ( 2010[ ), refers to 



the fact that our approach targets the exact marginal distributions, for any fixed value 
5.1. Intractable densities 

We have argued that SMC^ can cope with state-space models with intractable transition densities 
provided these can be simulated from. More generally, it can cope with intractable transition of 
observation densities provided they can be unbiasedly estimated. Filtering for dynamic models with 
intractable densities for which unbiased estimators can be computed was discussed in |Fearnhead| 



et al. ( )2008) . It was shown that replacing these densities by their unbiased estimators is equivalent 
to introducing additional auxiliary variables in the state-space model. SMC^ can directly be applied 
in this context by replacing these terms by the unbiased estimators to obtain sequential state and 
parameter inference for such models. 

5.2. SMC^ for tempering 

A natural question is whether we can construct other types of SMC^ algorithms, which would be 
"exact approximations" of different SMC strategies. Consider for instance, again for a state-space 



model, the following geometric bridge sequence (in the spirit of e.g. Neal 20011, which allows for 
a smooth transition from the prior to the posterior: 

nt{e)^p{9){p{y^,T\e)r\ ^t=t/L, 

where L is the total number of iterations. As pointed out by one referee, see also |Fulop and Duan| 
(2011 1, it is possible to derive some sort of SMC^ algorithm that targets iteratively the sequence 



TTtie) ^ Pie) {p{yi..T\0)y\ it^t/L, 

where p{yi:T\6) is a particle filtering estimate of the likelihood. Note that {p{yi:T\&)}'^' is not an 
unbiased estimate of {p{yi:T\6)}'^'^ when 74 G (0, 1). This makes the interpretation of the algorithm 
more difficult, as it cannot be analysed as a noisy, unbiased, version of an ideal algorithm. In 
particular. Proposition [2] on the complexity of SMC^ cannot be easily extended to the tempering 
case. It is also less flexible in terms of PMCMC steps: for instance, it is not possible to implement 
the conditional SMC step described in Section |3.6.2[ or more generally a particle Gibbs step, 
because such steps rely on the mixture representation of the target distribution, where the mixture 
index is some selected trajectory, see ([9]), and this representation does not hold in the tempering 
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V 



(f) (g) (h) 

Fig. 4. Athletics records, (a) Best two times of eacli year, in women's 3000 metres events between 1 976 and 
201 0; (b) box-plots over 1 runs of SIVIC^ of estimates of the probability of interest (top) pl°'^-^'\ (middle) pr'"* 
and (bottom) p**'' the y-axis is in log scale and the dotted line indicates the year 1993; (c)-(e) posterior 
distribution of the parameters approximated by SIVIC^, with results of 10 independent runs overlaid on each 
plot and where the full line represents the prior density function; (f)-(h) probability of observing at year 1993 a 
recorded time less than 486.11 seconds (blue triangles, lower cloud of points) and less than 502.62 seconds 
(red circles, upper cloud of points) against the components of 6, where point sizes and transparencies are 
proportional to the weights of the 6i-particies. 
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case. More importantly, this tempering strategy does not make it possible to perform sequential 
analysis as the SMC^ algorithm discussed in this paper. 

The fact remains that this tempering strategy may prove useful in certain non-sequential sce- 
narios, as suggested by the numerical examples of Fulop and Duan (2011 ). It may be used also for 
determining MAP (maximum a posteriori) estimators, and in particular the maximum likelihood 
estimator (using a flat prior), by letting 7^ +00. 
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Appendix A: Proof of Proposition ^ 

We remark first that 'ijjt,e may be rewritten as follows: 

{N^ ~) C t 

n=l J l,s=2n=l 
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Starting from ([t]) and ([s]), one obtains 



^X2/l:t) 



s — 1 l^n— 1 



Pie) 



E 



t-i r 



,8 = 2 1 = 1 



by distributing the final product in the first hne, and using the convention that wi^g{xQ° jX") = 

To obtain ([s]), we consider the summand above, for a given value of n, and put aside the random 
variables that correspond to the state trajectory x".^. We start with x".j(t) = x", and note that 

wt,e{xt-i\x^)qt,e{xt\xt-i) = /e(a;r|a^°-T)5e(ytkr) 



and that 



k i=l ) 

Thus, the summand in the expression of ttj above may be rewritten as 

C Wx 



(xll,(t- 2),xr.,(t- 1))/, (x«,(t)|x?^,(i- !))<?, {yt\yil,m ^q^AA) 



't-l Wx 



Wx 



i-1 C Wx 



^i^hj'(n) 

By applying recursively, for s i — 1, . . . , 1 the same type of substitutions, that is, 
z«,(xr:,(s- l),xl^,(s))g3,,(x^"(^V^i^'^) = fe{^l,{s)K:t{^~l))ge{ys\KAs)), 

and, for s > 2, 

<-"m |E^-l.«(^-2^^:-l)| = K-A^ - 2),xj,(s - 1)) . 



. i=l 



and noting that 



t 

= P{0) n {/« (^iAs)KAs - 1)) 5. (2/.|x?,(s))} : 



s = l 



where p{9, x".j, yi:f ) stands for the joint probability density defined by the model, for the triplet of 
random variables (6, xi;t,yi:t), evaluated at xi;t — x".j, one eventually gets: 



TTt{e,xlf-,a]::^_-,)^p{9\y,A^ 



A^x 



A^x 



JVx 



i=l 



s=2 i=l 



> . 
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Appendix B: Proof of Proposition |2] and discussion of assumptions 

Since p{6\yi:t) is the marginal distribution of ttj t+p, by iterated conditional expectation we get: 



p{yt+i:t+p\yi:t) 



E, 



'p{s\yi:t) 



IEfft.t+p(-|e) {^t+p|t( 



E, 



•p(e\yi 



E 



■.t+p^ "-l-.t+p-lJ f f 



To study the inner expectation, we make the following first set of assumptions: 
(Hla) For all eO, and x,x',x" € X, 



(Hlb) For all 9 e Q, x,x' e X, y e y, 



fe{x\x") 
9eiy\x) 



</?• 



< 5. 



9e{y\x') 

Under these assumptions, one obtains the following non-asymptotic bound. 



Proposition 3 (Theorem 1.5 of Cerou et al. (2011)). For > (35p, 



^t+p\t(^' ^l:t+p> O'l-.t+p- 



l) 



p{yt+l:t+p\yi:t,Oy 



1 < 4/3(5 



_p_ 



The Proposition above is taken from Cerou et al. (2011), up to some change of notations and a 
minor modification: Cerou et al. (2011 ) establish this result for the likelihood estimate Zt, obtained 



by running a particle from time 1 to time t. However, their proof applies straightforwardly to the 
partial likelihood estimate Zt+p\t, obtained by running a particle filter from time t + 1 to time 
t + p, and therefore with initial distribution rjo set to the mixture of Dirac masses at the particle 
locations at time t. We note in passing that Assumptions (Hla) and (Hlb) may be loosened up 
slightly, see Whiteley (2011). A direct consequence of Proposition [Sj the main definitions and the 
iterated expectation is Proposition 2(a) for rj = 4/3(5. 

Proposition 2(b) requires a second set of conditions taken from [Chopin (2002). These relate to 
the asymptotic behaviour of the marginal posterior distribution p{0\yi:t) and they have been used 
to study the weight degeneracy of IBIS. Let lt{0) — logp(?/i:t|6'). The following assumptions hold 
almost surely. 

(H2a) The MLE 9t (the mode of function lt{d)) exists and converges to 9q as n +oo. 
(H2b) The observed information matrix defined as 

* t 8989' 

is positive definite and converges to I{9q), the Fisher information matrix. 
(H2c) There exists A such that, for 6 e (0, A), 



lim sup 



sup 



\>s 



{m - km} 



< 0. 



(H2d) The function It/t is six-times continuously differentiable, and its derivatives of order six are 
bounded relative to t over any compact set 0' C 6. 
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Under these conditions one may apply Theorem 1 of Chopin ( 2002 ) (see also Proof of Theorem 



Chopin 2004) to conclude that £t^+p > 27 for a given 7 > and t large enough, provided 
p = I Tt I for some r > (that depends on 7). Together with Proposition 2(a) and by a small 
modification of the Proof of Theorem 4 in Chopin ( 2004 ) to fix 7 instead of r , we obtain Proposition 
2(b) provided — and r] = 4/36. 

Note that (H2a) and (H2b) essentially amount to establishing that the MLE has a standard 
asymptotic behaviour (such as in the IID case). This type of results for state-space models is 
far from trivial, owning among other things to the intractable nature of the likelihood p{yi;t\0). 
A good entry in this field is Chapter 12 of Cappe et al. ( 2005| ), where it can be seen that the 
first set of conditions above, (Hla) and (H2b), are sufficient conditions for establishing (H2a) and 
(H2b), see in particular Theorem 12.5.7 page 465. Condition (H2d) is trivial to establish, if one 
assumes bounds similar to those in (Hla) and (Hlb) for the derivatives of ge and fe- Condition 
(H2c) is harder to establish. We managed to prove that this condition holds for a very simple 
linear Gaussian model; notes are available from the first author. Independent work by Judith 
Rousseau and Elisabeth Gassiat is currently carried out on the asymptotic properties of posterior 
distributions of state-space models, where (H2c) is established under general conditions (personal 
communication) . 

The implication of Proposition 2 to the stability of SMC^ is based on the additional assump- 
tion that after resampling at time t we obtain exact samples from TTt. In practice, this is only 
approximately true since an MCMC scheme is used to sample new particles. This assumption also 
underlies the analysis of IBIS in Chopin (2002), where it was demonstrated empirically (see e.g. 
Fig. 1(a) in that paper) that the MCMC kernel which updates the 9 particles has a stable effi- 
ciency over time since it uses the population of 0-particles to design the proposal distribution. We 
also observe empirically that the performance of the PMCMC step does not deteriorate over time 
provided N^; is increased appropriately, see for example Figure [l|b) . It is important to establish 
such a result theoretically, i.e., that the total variation distance of the PMCMC kernel from the 
target distribution remains bounded over time provided is increased appropriately. Note that 
a fundamental difference between IBIS and SMC^ is that respect is that in the latter the MCMC 
step targets distributions in increasing dimensions as time increases. Obtaining such a theoretical 
result is a research project on its own right, since such quantitative results lack, to the best of our 
knowledge, from the existing literature. The closest in spirit is Theorem 6 in |Andrieu and Roberts| 
(2009) which, however, holds for "large enough" N^, instead of providing a quantification of how 
large needs to be. 



